Suppose you want to buy \(x\) shares of TGT stock.
- Your broker charges you $5 for the transaction.
- TGT is currently selling for $64.96 per share.

The total cost \(y\), in dollars, is then an exact linear function of \(x\):
\[ y = 5 + 64.96 x. \]
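As a quick check of the formula, consider a hypothetical purchase of 10 shares (the share count is chosen only for illustration):

\[ y = 5 + 64.96(10) = 654.60 \]

The $5 fee is paid once and each additional share adds $64.96, so every pair \((x, y)\) falls exactly on the line.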
A linear relationship between an explanatory variable \(x\) and a response variable \(y\) can be estimated by a regression line:
\[ \hat{y} = b_0 + b_1 x \]
We use the symbol \(\hat{y}\) to emphasize that this is a predicted value of \(y\).
Researchers captured 104 brushtail possums and took body measurements before releasing the animals back into the wild. We consider two of these measurements: the total length of each possum, from head to tail, and the length of each possum’s head.
The equation of the regression line for predicting Head Length from Total Length is \(\hat{y} = 41 + 0.59x\).
Use the regression model to predict the Head Length of a possum whose Total Length is 76.0 cm.
One of the possums in the sample had a Total Length of 76.0 cm and a Head Length of 85.84 mm. Did the model prediction overestimate or underestimate the Head Length? By how much?
Residuals are the leftover variation in the data after accounting for the model fit.
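A residual is the observed value minus the predicted value, \(e = y - \hat{y}\). The Python sketch below applies this to the possum regression line \(\hat{y} = 41 + 0.59x\). The observation it uses (Total Length 80.0 cm, Head Length 90.0 mm) is hypothetical and not taken from the data set, and the function names are made up for this illustration.

```python
def predict_head_length(total_length_cm):
    """Predicted head length (mm) from the line y-hat = 41 + 0.59x."""
    return 41 + 0.59 * total_length_cm


def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted


# Hypothetical possum, NOT from the study data.
total_length_cm = 80.0
head_length_mm = 90.0

predicted = predict_head_length(total_length_cm)
e = residual(head_length_mm, predicted)

print(f"predicted: {predicted:.2f} mm")  # 88.20 mm
print(f"residual:  {e:.2f} mm")          # 1.80 mm
```

A positive residual means the point lies above the line (the model underestimated), and a negative residual means the point lies below it (the model overestimated).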
Everyone (especially the Operator): open the following applet:
https://www.rossmanchance.com/applets/2021/regshuffle/regshuffle.htm
Click the Show Regression Line checkbox and record the equation of the regression line.
Click Show Residuals. Find the largest negative residual and click on its point in the scatterplot. Record the value of the residual.
Delete this row by clicking the Delete button that appeared in the previous step. How did this affect the slope of the regression line?
The correlation coefficient \(r\) describes the strength and direction of a linear relationship.
- \(r\) is a number between -1 and 1.
- Everyone uses software to calculate \(r\).
- The sign of \(r\) tells you the direction of the relationship.
| Range of \(\lvert r \rvert\) | Strength | Meaning |
|---|---|---|
| \(0.7 \leq \lvert r \rvert \leq 1\) | Strong | Points almost form a line. |
| \(0.3 \leq \lvert r \rvert < 0.7\) | Moderate | Clear pattern, but bloblike. |
| \(0.1 \leq \lvert r \rvert < 0.3\) | Weak | Slight pattern. |
| \(0 \leq \lvert r \rvert < 0.1\) | None | No discernible trend. |
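The table can be turned into a small lookup. In this Python sketch the function names `strength` and `direction` are made up for illustration. A value that falls exactly on a cutoff, such as 0.3 or 0.7, is assigned to the stronger category; that tie-breaking rule is an assumption, and the cutoffs themselves are only rules of thumb.

```python
def strength(r):
    """Label the strength of a linear relationship from its correlation r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength depends only on the magnitude of r
    if size >= 0.7:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    if size >= 0.1:
        return "Weak"
    return "None"


def direction(r):
    """The sign of r gives the direction of the relationship."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no direction"


for r in (-0.54, 0.16, 0.85):
    print(r, strength(r), direction(r))
```

For these three values the labels are Moderate negative, Weak positive, and Strong positive.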
A: \(r = -0.54\)
B: \(r = 0.16\)
C: \(r = 0.46\)
D: \(r = -0.44\)
E: \(r = 0.69\)
F: \(r = 0.85\)
Return to the correlation applet of the previous group exercise.
Reload the applet to get it back to the original state.
Check the Show Regression Line and Correlation coefficient boxes. Record the value of \(r\).
Now try deleting points from the scatterplot and see if you can get the value of \(r\) to be greater than 0.85. How many points did you have to delete?
Rows: 50
Columns: 3
$ family_income <dbl> 92.922, 0.250, 53.092, 50.200, 137.613, 47.957, 113.534, 168.579, 208.115, 1…
$ gift_aid <dbl> 21.720, 27.470, 27.750, 27.220, 18.000, 18.520, 13.000, 13.000, 14.000, 25.4…
$ price_paid <dbl> 14.280, 8.530, 14.250, 8.780, 24.000, 23.480, 23.000, 29.000, 28.000, 16.530…
Call:
lm(formula = gift_aid ~ family_income, data = elmhurst)
Residuals:
Min 1Q Median 3Q Max
-10.1128 -3.6234 -0.2161 3.1587 11.5707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.31933 1.29145 18.831 < 2e-16 ***
family_income -0.04307 0.01081 -3.985 0.000229 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.783 on 48 degrees of freedom
Multiple R-squared: 0.2486, Adjusted R-squared: 0.2329
F-statistic: 15.88 on 1 and 48 DF, p-value: 0.0002289
\[ \widehat{\texttt{gift_aid}} = 24.3 - 0.0431 \times \texttt{family_income} \]
For each additional \(\$1,000\) of family_income, the amount of gift_aid decreases by about \(\$43\).
The fitted model returns 20.8736 for family_income = 80 (incomes are recorded in thousands of dollars). The predicted gift_aid for a family_income of \(\$80,000\) is about \(\$20,874\).
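The prediction for an \(\$80,000\) income can be reproduced by hand from the fitted coefficients. This Python sketch hard-codes the estimates printed in the regression output (24.31933 and -0.04307). The function name `predict_gift_aid` is made up for this illustration. Because the printed coefficients are rounded, the result may differ from the software's value in the last digits.

```python
# Estimates copied from the regression output (units: thousands of dollars).
INTERCEPT = 24.31933
SLOPE = -0.04307


def predict_gift_aid(family_income):
    """Predicted gift aid, in $1000s, for a family income given in $1000s."""
    return INTERCEPT + SLOPE * family_income


# A family income of $80,000 is entered as 80.
prediction = predict_gift_aid(80)
print(f"{prediction:.2f} thousand dollars")
```

This prints a value of roughly 20.87, which agrees with the 20.8736 reported by the software up to rounding of the coefficients.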
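For family_income = 3000 the fitted model returns -104.8956. The predicted gift_aid for a family_income of \(\$3,000,000\) is negative! The line summarizes the families in the sample, so extrapolating it to incomes far outside their range can give impossible values like this one.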
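To see roughly where the line stops giving sensible predictions, one can solve for the income at which the predicted aid reaches zero, using the rounded coefficients of the fitted equation:

\[ 0 = 24.3 - 0.0431 \times \texttt{family_income} \quad\Longrightarrow\quad \texttt{family_income} = \frac{24.3}{0.0431} \approx 564 \]

Since income is measured in thousands of dollars, the line predicts negative aid for any family income above roughly \(\$564,000\).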
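About 25% of the variation in gift_aid (the Multiple R-squared of 0.2486) can be explained by the linear relationship with family_income.
The correlation coefficient is computed from the data as
\[ r= \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i -\bar{y}}{s_y} \right) \]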
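In a regression with a single explanatory variable, the Multiple R-squared in the output equals the square of the correlation coefficient, so \(r\) can be recovered from it. The sign is taken from the slope, which is negative here:

\[ r = -\sqrt{0.2486} \approx -0.50 \]

By the strength table given earlier, this is a moderate negative linear relationship.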
\[ \begin{align} \hat{y} &= b_0 + b_1 x && \text{regression line} \\ b_1 &= r \frac{s_y}{s_x} && \text{sample slope} \\ b_0 &= \bar{y} - b_1\bar{x} && \text{sample $y$-intercept} \end{align} \]
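The formulas for \(r\), the slope, and the intercept can be checked numerically. The Python sketch below uses only the standard library and a small made-up data set (five points chosen for easy arithmetic, not taken from any study). The variable names are illustrative.

```python
from statistics import mean, stdev

# Small made-up data set, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)

# Correlation: sum of products of standardized values, divided by n - 1.
r = sum(
    ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
) / (n - 1)

# Slope and intercept of the regression line.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(f"r = {r:.4f}")                             # r = 0.7746
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")  # y-hat = 2.20 + 0.60x
```

The intercept formula forces the line through the point \((\bar{x}, \bar{y})\): here \(2.2 + 0.6 \times 3 = 4 = \bar{y}\).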